Comparison of IRT Observed-Score and True-Score 'Equatings'

Abstract 

Two methods of 'equating' tests using item response theory are
compared, one using true scores, the other using the estimated
distribution of observed scores. On the data studied, they yield almost
indistinguishable results. This is a reassuring result for users of
IRT equating methods.
Comparison of IRT Observed-Score and True-Score 'Equatings'*

Most  IRT  equating  is  currently  attempted  by  the  true-score  equating 
procedure  described  in  Lord  (1980,  Chapter  13).  Lord  also  describes 
an  IRT  observed-score  procedure,  which  until  now  seems  not  to  have  been 
further  investigated,  perhaps  because  it  is  more  complicated  and  more 
expensive  than  the  true-score  procedure.  The  present  article  reports 
an  empirical  research  study  comparing  the  results  of  applying  these  two 
procedures  to  real  test  data. 

Sections 1 and 2 outline the true-score and the observed-score
procedures, respectively. Section 3 discusses the theoretical
advantages and disadvantages of each procedure. Section 4 describes
the real test data used to provide a comparison of the two methods.
Section 5 describes the procedures for estimating item and ability
parameters. Section 6 reports and summarizes the empirical results.

Item  response  theory  models  the  probability  of  a  correct  response 
by  an  examinee  to  a  test  item  as  a  monotonically  increasing  function  of 
ability.  The  model  used  here  is  Birnbaum's  three-parameter  logistic 
model  given  by  the  following  formula: 

*This work was supported in part by contract N00014-80-C-0402,
project designation NR 150-453, between the Office of Naval Research and
Educational Testing Service. Reproduction in whole or in part is
permitted for any purpose of the United States Government.
$$P_i(\theta_a) = c_i + \frac{1 - c_i}{1 + \exp[-1.7 a_i(\theta_a - b_i)]} \tag{1}$$

where $P_i(\theta_a)$ is the probability of examinee a getting item i
correct;

$b_i$ is the difficulty of item i;

$a_i$ is the discrimination index for item i;

$c_i$ is the lower asymptote for item i;

$\theta_a$ is the ability of examinee a $(-\infty < \theta_a < \infty)$.

$P_i(\theta_a)$ has a minimum of $c_i$ and a maximum of 1. This model
assumes that the test is unidimensional.
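The three-parameter logistic model can be sketched in a few lines of Python. This is only an illustration: the function name `p_3pl` and the item parameter values are hypothetical and do not come from the study.

```python
import math

def p_3pl(theta, a, b, c):
    """Probability of a correct response under the three-parameter logistic model."""
    return c + (1.0 - c) / (1.0 + math.exp(-1.7 * a * (theta - b)))

# Hypothetical item: discrimination 1.0, difficulty 0.0, lower asymptote 0.2.
# At theta = b the probability is halfway between c and 1, that is (1 + c) / 2.
print(p_3pl(0.0, a=1.0, b=0.0, c=0.2))
```

The probability approaches the lower asymptote c for very low abilities and approaches 1 for very high abilities.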


1. True-Score Equating


Since the expected score of examinee a on item i is $P_i(\theta_a)$, the
examinee's expected number of right answers is $\sum_i P_i(\theta_a)$. In
classical test theory, this expectation is called the (number-right) true
score, $\xi_a = \sum_i P_i(\theta_a)$. For the moment, we do not deal with
the scores of particular examinees, so the subscript a will be dropped.
Here the true score for test X containing n items is the mathematical
variable

$$\xi = \sum_{i=1}^{n} P_i(\theta), \tag{2}$$

a monotonic increasing function of $\theta$. If test Y contains m items
and measures the same ability $\theta$ as test X, the true score on test
Y is the mathematical variable

$$\eta = \sum_{j=1}^{m} P_j(\theta). \tag{3}$$

The variables $\xi$, $\eta$, $\theta$ are all measures of the same
psychological trait; they differ only in the numerical scale on which the
measurements are expressed. Thus true scores $\xi = \xi_0$ and
$\eta = \eta_0$ corresponding to any given $\theta = \theta_0$ represent
identical levels of ability. Any examinee whose true score on test X is
$\xi_0$ must automatically have a true score on test Y of exactly
$\eta_0$, provided the IRT model holds. The situation is the same as when
we say that 32° Fahrenheit has the same meaning as 0° Celsius, except
that these temperature scales have a linear relationship, whereas the
true-score scales have a nonlinear relationship. Thus, $\xi_0$ and
$\eta_0$ are equated true scores; this is true in a much stronger sense
than is usually implied by the term equated.

In IRT true-score equating, estimated item parameters are substituted
into (2) and (3) and a table of corresponding values of $\xi$ and $\eta$
is calculated. This constitutes the true-score equating table. This
table is then applied in practice as if the true scores were observed
number-right scores. Since observed scores have different properties
than true scores, this last step has no clear theoretical justification.
It is done as a practical procedure, to be justified only by whatever
usefulness and reasonableness can be empirically demonstrated for the
results.
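One way to build an entry of the true-score equating table is to invert (2) numerically and then evaluate (3). The sketch below does this by bisection. The function names, the search interval for ability, and the item parameters in the usage example are assumptions made for illustration, and the sketch covers only true scores above the chance level (the sum of the lower asymptotes).

```python
import math

def p_3pl(theta, a, b, c):
    return c + (1.0 - c) / (1.0 + math.exp(-1.7 * a * (theta - b)))

def true_score(theta, items):
    """Sum of item response probabilities, as in (2) or (3)."""
    return sum(p_3pl(theta, a, b, c) for a, b, c in items)

def equate_true_score(xi, items_x, items_y, lo=-10.0, hi=10.0, tol=1e-10):
    """Map a true score xi on test X to the equated true score on test Y.

    The bracket [lo, hi] for ability is an assumption of this sketch.
    """
    if not true_score(lo, items_x) < xi < true_score(hi, items_x):
        raise ValueError("xi is outside the range reachable on [lo, hi]")
    while hi - lo > tol:              # bisection: true_score increases with theta
        mid = 0.5 * (lo + hi)
        if true_score(mid, items_x) < xi:
            lo = mid
        else:
            hi = mid
    return true_score(0.5 * (lo + hi), items_y)

# Hypothetical three-item tests given as (a, b, c); test Y is harder than test X.
items_x = [(1.0, -0.5, 0.2), (0.8, 0.0, 0.2), (1.2, 0.5, 0.2)]
items_y = [(1.0, 0.0, 0.2), (0.8, 0.5, 0.2), (1.2, 1.0, 0.2)]
print(equate_true_score(2.0, items_x, items_y))
```

Because test Y is harder in this example, a true score of 2.0 on X corresponds to a lower true score on Y.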

2.  IRT  Observed-Score  Equating 

If the assumptions of IRT hold (as is assumed throughout), the
probability that an examinee of ability $\theta$ will have a number-right
score of x = 1 on a two-item test is $P_1 Q_2 + Q_1 P_2$, where
$P_i = P_i(\theta)$ and $Q_i = 1 - P_i$. The probability that this
examinee's score is 0 or 2 is $Q_1 Q_2$ or $P_1 P_2$ respectively. These
probabilities constitute the conditional frequency distribution
$f_2(x \mid \theta)$.

If a third item is added to this test, the distribution of x
is now

$$f_3(x \mid \theta) = Q_3 f_2(x \mid \theta) + P_3 f_2(x - 1 \mid \theta) \qquad (x = 0, 1, 2, 3),$$

where $f_r(x \mid \theta) = 0$ if x < 0 or x > r. Using this recursive
procedure, a computer can readily determine $f_n(x \mid \theta)$, even
for an n of several hundred.
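The recursion can be sketched as follows. The function name `number_right_distribution` is hypothetical; the input is the list of item probabilities for one fixed ability value.

```python
def number_right_distribution(probs):
    """Return [f_n(0|theta), ..., f_n(n|theta)] for item probabilities P_1..P_n."""
    f = [1.0]                         # zero items: score 0 with probability 1
    for p in probs:
        q = 1.0 - p
        new = [0.0] * (len(f) + 1)
        for x, fx in enumerate(f):
            new[x] += q * fx          # item answered incorrectly: score stays x
            new[x + 1] += p * fx      # item answered correctly: score becomes x + 1
        f = new
    return f

# Two hypothetical items with P_1 = 0.6 and P_2 = 0.7.
print(number_right_distribution([0.6, 0.7]))
```

Each added item costs work proportional to the current test length, so the whole distribution for an n-item test takes on the order of n squared operations.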

If the $\theta$ of each examinee is known, the (marginal) distribution
of x for a group of N examinees is

$$f(x) = \frac{1}{N} \sum_{a=1}^{N} f_n(x \mid \theta_a). \tag{4}$$

If an m-item test Y yields number-right score y and measures the
same ability as test X, then the (marginal) distribution of y for
a group of M examinees is

$$g(y) = \frac{1}{M} \sum_{b=1}^{M} g_m(y \mid \theta_b). \tag{5}$$
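The marginal distribution is the average of the examinees' conditional distributions. A minimal sketch, assuming the conditional distributions have already been computed and are passed in as lists of equal length (the function name and the two examinees are hypothetical):

```python
def marginal_distribution(conditionals):
    """Average conditional score distributions, one list per examinee."""
    n_examinees = len(conditionals)
    n_scores = len(conditionals[0])
    return [sum(f[x] for f in conditionals) / n_examinees for x in range(n_scores)]

# Two hypothetical examinees on a two-item test (scores 0, 1, 2).
f_low = [0.49, 0.42, 0.09]    # both item probabilities equal to 0.3
f_high = [0.04, 0.32, 0.64]   # both item probabilities equal to 0.8
print(marginal_distribution([f_low, f_high]))
```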


A  monotonic  transformation  of  the  y  scores  can  now  be  found  from  (4) 
and  (5)  such  that  the  distribution  of  the  transformed  y  scores  is  the 
same  as  the  distribution  of  the  (untransformed)  x  scores,  except  for 
irregularities  due  to  the  fact  that  x  and  y  can  only  assume  integer 
values.  This  is  done  by  finding,  for  each  y  score,  the  x  that  has 
the  same  percentile  rank  in  (4)  that  y  has  in  (5).  The  x  so  found 
is  the  desired  transformed  y  score. 
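The percentile-rank matching just described might be sketched as below. Two conventions here are assumptions of the sketch rather than statements from the text: the percentile rank of an integer score is taken at its mid-point, and each integer score k is treated as spread uniformly over the interval from k - 0.5 to k + 0.5 when the rank is inverted. The function name is hypothetical.

```python
def equipercentile(g_y, f_x):
    """For each integer y, return the x value with the same percentile rank.

    g_y and f_x are the score distributions of y and x (lists of probabilities).
    """
    equated = []
    cum_y = 0.0
    for gy in g_y:
        p = cum_y + 0.5 * gy          # mid-point percentile rank of this y score
        cum_y += gy
        cum_x = 0.0
        for k, fx in enumerate(f_x):  # locate the x interval that contains p
            if cum_x + fx >= p and fx > 0.0:
                equated.append(k - 0.5 + (p - cum_x) / fx)
                break
            cum_x += fx
        else:
            equated.append(len(f_x) - 0.5)   # p falls past the top of the x scale
    return equated

# Equating a distribution to itself should give back the integer scores.
print(equipercentile([0.25, 0.5, 0.25], [0.25, 0.5, 0.25]))
```

The transformed scores are generally not integers, which is one source of the irregularities mentioned above.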

If the examinees who took test Y have the same distribution of
$\theta$ as the examinees who took test X, then the resulting
transformation of y is an 'equipercentile equating' of the y scale to
the x scale. Within groups similar to the groups used to derive the
transformation, it has the valuable property that if a cutting score
is chosen on the x scale and the same cutting score is used on the
transformed y scale, the proportion of test X examinees selected
will be the same as the proportion of test Y examinees selected. This
property is essential if test X and test Y examinees are both to be
treated equitably, so that an examinee cannot complain that he was
injured by the choice of test administered.

When the groups taking tests X and Y are known to have
approximately the same distribution of $\theta$ (for example, they are
two random samples from the same population), there is no reason to use
IRT. It is much simpler to do the equipercentile equating using the
actual sample distributions of x and y, instead of (4) and (5). The need
for IRT arises when the ability distributions of the two groups may
differ. In this case, IRT may allow us to estimate the (marginal)
frequency distributions of number-right scores that would have resulted
if all examinees had taken both tests, without practice or fatigue
effects.

In  order  to  do  this,  the  item  and  ability  parameters  in  (4)  and  (5) 
must  all  be  on  the  same  scale.  This  is  usually  accomplished  by 
administering  a  suitable  'anchor  test'  to  both  groups  of  examinees.  All 
answer-sheet  responses  for  both  groups  are  used  in  a  single  computer 
run  that  estimates  all  parameters  on  the  same  scale.  These  estimates 
are  then  used  in  (4)  and  (5),  substituting  N  +  M  for  N  or  M  ,  to 
obtain  the  distributions  of  x  and  y  for  the  combined  group  of  N  +  M 
examinees. Equipercentile equating of y to x is then carried out
in the usual way.
3. Theoretical Perspectives

Practical workers, with the need for equating scores on two
different test forms, have over the years used widely different methods
(see Angoff, 1971) in an attempt to approximate the desired result.
Each practical worker, needing a word to describe his results, asserts
that he has produced an equating of y to x. Yet different methods
and different groups do not produce identical 'equatings'.

Braun  and  Holland  (1982,  page  14)  state:  "There  is  some 
disagreement  over  what  test  equating  is  and  the  proper  method  for  doing 
it."  They  then  adopt  the  definition  "Form-X  and  Form-Y  are  equated  on 
[population]  P  "  if  the  distribution  of  the  transformed  y  scores  in 
population P is the same as the distribution of the (untransformed) x
scores. 

This  definition  of  the  phrase  'equated  on  population  P  '  is 
beyond  reproach.  One  problem,  however,  is  that  the  qualifying  phrase 
'on  population  P  '  is  typically  dropped  by  the  practical  worker  who 
writes  a  research  report  or  publishes  an  equating  table  in  a  test 
manual. 

Unfortunately  (as  will  be  shown  later  in  this  section)  two  tests 
that  are  equated  on  population  P  will  typically  not  be  equated  for 
various  subpopulations  that  are  included  in  P  .  Test  scores  that  are 
equated  for  the  population  of  college  applicants  may  well  be  equated 
neither for the population of female college applicants, nor for the
population of male college applicants. The scores are still less
likely  to  be  equated  for  a  subpopulation  characterized  by  interest  in 
science,  or  in  music.  For  the  subpopulation  of  Harvard  applicants,  the 
situation  is  much  worse. 

If  the  proportion  of  applicants  admitted  to  Harvard  differs 
significantly  depending  on  whether  they  were  given  form  X  or 
form  Y  of  the  test,  it  is  clear  that  the  'equating'  was  unsuccessful. 
Since  similar  inequalities  are  likely  to  characterize  any  equating 
on  any  specified  population,  it  may  be  best  not  to  say  that  the  tests 
are 'equated' at all, or to simply say that they are 'approximately
equated.'

From  a  practical  point  of  view,  the  approximation  may  be  quite 
satisfactory  for  many  subgroups.  It  is  unlikely,  however,  that  the 
equating  will  be  adequate  for  any  subpopulation  having  a  mean  and 
variance  of  ability  that  is  sharply  different  from  the  mean  and  variance 
of  the  total  population  used  to  derive  the  equating  transformation. 
Extensive  practical  data  illustrating  the  adequacies  and  the 
inadequacies  of  approximate  equatings  are  given  in  the  30-volume 
Anchor  Test  Study  (Loret,  Seder,  Bianchini,  and  Vale,  1974). 

For  a  theoretical  discussion  of  alternative  equating  methods, 
however,  it  is  important  not  to  start  out  with  a  definition  of  equating 
that  is  clearly  inadequate  for  subpopulations  of  examinees.  Given  that 
the  IRT  model  holds,  IRT  observed-score  equating  would,  for  example,  be 
automatically  endorsed  by  the  Braun  and  Holland  definition,  since  their 
definition  mandates  equipercentile  equating.  IRT  true-score  equating 
would  be  definitely  rejected  by  their  definition,  since  in  general  it 
will not lead to x scores and transformed y scores having the same
frequency  distribution,  unless  X  and  Y  are  strictly  parallel  forms 
that  are  identical  in  difficulty,  in  reliability,  and  also  in  most  other 
respects. 

The  important  virtue  of  IRT  true-score  equating  is  that  if  the  IRT 
model  holds,  the  true  scores  are  clearly  equated  for  all  subpopulations 
of  examinees.  This  results  from  the  invariance  of  IRT  parameters  across 
populations  of  examinees,  assumed  by  the  IRT  model.  The  clear  flaw  in 
IRT  true-score  equating  is  that  it  equates  true  scores,  not  the 
actually  observed  fallible  scores.  Treating  observed  scores  as  if  they 
were  true  scores  cannot  be  justified  on  any  theoretical  grounds. 

The  virtue  of  IRT  observed-score  equating  is  that  in  a  group  like 
that  used  to  derive  the  equating,  any  cutting  score  will  accept  the 
same  percentage  of  examinees  regardless  of  the  test  administered.  The 
flaw  is  that  this  holds  only  for  that  total  group  and  not  for  other 
groups  or  subgroups. 

This last statement is most clearly seen from a very extreme
example. Suppose forms X and Y have the same number of items,
measure the same ability $\theta$, but differ in difficulty. If the
equipercentile equating is carried out on a group of examinees all
of whom are guessing at random on almost all the items, the difference
in difficulty between the two forms will not manifest itself and any
equipercentile equating will approximate an identity transformation of
score y. If a slightly more competent group of examinees is used for
the equipercentile equating, however, the difference in difficulty
between forms will begin to become apparent and most y scores will be
adjusted upwards or downwards accordingly. As the competence of the
group used becomes higher and higher, the equating transformation found
will differ more and more from the identity transformation found from
the original extreme group.

As a second example of the inescapable invalidity of observed-score
equating, suppose that tests X and Y are of equal difficulty
and that the true scores $\xi$ and $\eta$ have equal variance, but that
y is much less reliable than x. Consider a subgroup of very
talented examinees; to make the illustration clear, consider that in
this subgroup all examinees have nearly identical $\theta$ values. Most
of the variation in observed scores x and y is now due to errors of
measurement. The equipercentile equating transformation found will thus
approximate a straight line with slope

$$\frac{\text{standard deviation of the errors of measurement in } x}{\text{standard deviation of the errors of measurement in } y}.$$

Since y is much less reliable than x, the slope will be much
less than 1.

If, on the other hand, the equipercentile equating transformation
is found from a group where the true-score variance is large compared to
the error variances, the transformation will tend to approximate a
straight line with slope

$$\frac{\text{standard deviation of true scores on } x}{\text{standard deviation of true scores on } y}.$$
Intermediate situations will provide transformations with intermediate
slopes. If the wrong equating is applied to any given subpopulation,
the proportion of examinees in the subpopulation accepted will depend
on whether they took test X or test Y, an inequitable result.
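The dependence of the slope on the group can be illustrated numerically. The sketch below assumes, as in the straight-line approximation above, that the equating slope is the ratio of observed-score standard deviations, and that observed-score variance is the sum of true-score variance and error variance (the classical test theory decomposition). The error variances are hypothetical values chosen so that y is much less reliable than x.

```python
import math

def linear_equating_slope(var_true, var_err_x, var_err_y):
    """Slope SD(x) / SD(y) when both true scores have the same variance var_true."""
    return math.sqrt((var_true + var_err_x) / (var_true + var_err_y))

# Hypothetical error variances: error SD of 2 for x and 4 for y.
var_err_x, var_err_y = 4.0, 16.0
for var_true in (0.0, 10.0, 100.0, 10000.0):
    slope = linear_equating_slope(var_true, var_err_x, var_err_y)
    print(var_true, round(slope, 3))
```

With no true-score variation the slope equals the ratio of error standard deviations (here 2/4); as true-score variance grows, the slope moves toward the ratio of true-score standard deviations (here 1).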

Our theoretical position, then, is that each method described in
Sections 1 and 2 (as well as all other available equating methods) has
its own inadequacies. Since, in practice, some (approximate) equating
method must be used, it will be informative to investigate empirically
how the two methods of Sections 1 and 2 compare in a specially contrived
practical situation where the correct equating is actually known in
advance.


4.  Data 

These two equating methods were used to equate the chain of six SAT
verbal tests described by Petersen, Cook and Stocking in the report
IRT Versus Conventional Equating Methods: A Comparative Study of Scale
Stability. The tests in this chain were selected such that the first
test and the last test are the same. Each test is equated to the next
test in the chain using an anchor test. Figure 1 is a diagram of the
chain. The capital letters represent the test form, the small letters
represent the anchor test. Scores on form V4 are equated to scores on
form X2 using the anchor test fe. These equated scores on X2 are
equated to scores on form Y3 using the anchor test fm. This gives
us an equating of form V4 to Y3. In this manner, one proceeds
through the chain, with the final equating of Z5 to V4 giving us a
table of scores on the original V4 equated to the scores on the V4 at
the end of the chain. Any deviation from equality between the two sets
of scores could be attributable to scale drift or lack of model fit.

Figure 1. Chain of six SAT verbal equatings. Upper case letters
designate test forms; lower case letters designate
anchor tests.
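Chaining the links amounts to composing the individual equating tables. The sketch below is only an illustration of that idea: it assumes each link is stored as a sorted table of (score, equated score) pairs and that linear interpolation is acceptable between table entries. The function names and the two toy links are hypothetical.

```python
from bisect import bisect_right

def interpolate(table, score):
    """Linearly interpolate in a sorted equating table [(from_score, to_score), ...]."""
    xs = [s for s, _ in table]
    i = min(max(bisect_right(xs, score), 1), len(xs) - 1)
    (x0, y0), (x1, y1) = table[i - 1], table[i]
    return y0 + (y1 - y0) * (score - x0) / (x1 - x0)

def chain(tables, score):
    """Pass a score through each link of the chain in order."""
    for table in tables:
        score = interpolate(table, score)
    return score

# Two hypothetical links that undo each other, so the chain returns to its start.
link_1 = [(0.0, 2.0), (10.0, 12.0)]
link_2 = [(0.0, -2.0), (20.0, 18.0)]
print(chain([link_1, link_2], 5.0))
```

When the first and last forms are the same, the difference between the chained result and the starting score measures the deviation from equality discussed above.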

Each  form  in  the  chain  has  85  items  except  form  V4  which  has  90 
items.  Each  anchor  test  has  40  items.  For  each  form  there  are  two 
samples of examinees, each sample taking a different anchor test. The two
groups  taking  each  form  were  random  samples  from  the  same  population  for 
all  of  the  forms  except  Y3  .  For  the  parameter  estimation  runs  a 
random  sample  of  approximately  2670  examinees  was  selected  from  the  data 
obtained  at  the  test  administration  of  that  form  and  anchor  test. 

5.  Parameter  Calibration 

The item parameters and abilities were estimated by a modified
version of the computer program LOGIST (Wood, Wingersky, & Lord, 1976)
in six separate calibration runs. In Figure 1, each box (containing two
forms and one anchor test) represents one LOGIST run. The item
responses for items not taken by an examinee, such as the X2 items for
examinees taking form V4 in box 1, are treated as not-reached items.

All  of  the  estimated  parameters  within  each  LOGIST  run  are  on  the 
same  scale  and  either  method  of  equating  can  be  used  to  equate  the 
scores  for  the  two  tests.  The  anchor  tests  are  not  used  directly  in  the 
equating  but  are  used  in  LOGIST  so  that  the  estimated  parameters 
within  a  LOGIST  run  are  on  the  same  scale. 
6. Results

In using the IRT observed-score equating method, two estimated
distributions of observed scores are equated so that the transformed y
scores and the (untransformed) x scores have the same distribution.
Figure 2 is presented to demonstrate that, at least for one set of data,
this estimated distribution of observed scores is a reasonable fit to
the actual distribution of observed scores. The frequencies are
plotted against formula scores, which are the number right minus a
fraction of the number wrong. The fraction is one over (the number of
choices minus one). Since the estimated observed-score distribution can
only be obtained for number-right scores, the transformation to formula
scores assumes that there are no omits, that is, that the number wrong
is the total number of items minus the number right. In order to compare
the two distributions, the observed-score distribution should be based
on a group that has no omits. Consequently, a form of the SAT verbal
different from the ones in the chain was used for this figure in order
to get a sufficiently large sample for the frequency distribution
and for the item calibration.
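The conversion from number-right scores to formula scores under the no-omits assumption can be sketched as follows. The sketch assumes the conventional correction in which the fraction is one over (the number of choices minus one); the five-choice value in the usage example is an assumption for illustration, and the function name is hypothetical.

```python
def formula_score(number_right, n_items, n_choices):
    """Formula score assuming no omits, so number wrong = n_items - number_right."""
    number_wrong = n_items - number_right
    return number_right - number_wrong / (n_choices - 1)

# An 85-item form with five-choice items (the number of choices is assumed here).
for right in (85, 51, 17):
    print(right, formula_score(right, n_items=85, n_choices=5))
```

With five-choice items, an examinee who answers one item in five correctly receives a formula score of 0, which is why the chance level sits at 0 on the formula-score scale.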

The agreement shown in Figure 2 is good except that the tails of
the estimated distribution are too high. This discrepancy is presumably
due to the use of estimated $\theta$ in place of true $\theta$ for the
practical implementation of (4). Since a similar discrepancy affects
the estimated observed-score distributions of both test X and test
Y, the effects of the discrepancies tend to cancel out in the equating
process.

In our chain-equating study, each method of equating was applied
separately to the whole chain of equatings, resulting in a line for each
method equating form V4 at the beginning of the chain to form V4 at the
end of the chain. These two lines are plotted in Figure 3 along with a
45° line. The solid line is the IRT true-score equating line; the
dotted line, falling practically on top of the solid line, is the IRT
observed-score equating line. To equate scores below chance level,
that is, below $\sum_{i=1}^{n} c_i$, for the IRT true-score line, the
method given on pages 210-211 of Lord (1980) was used. For scores above
0, the maximum difference between the two equatings was .2; for scores
below 0, the maximum difference was .8, which occurred at the chance
level. If the equating methods were perfect and there were no scale
drift, the equating line would be the dashed 45° line.

Figure 4 shows the two equating methods applied to one individual
link in the chain. This particular link was selected because the IRT
true-score equating line between these two forms had the greatest
discontinuity in the slope at the chance level. The largest difference
between the two lines occurred at the chance level and was 1.6. For
scores above 0 the maximum difference between the two lines was .4.
Given that there is no clear theoretical justification for applying
IRT true-score equating to observed scores and that the equipercentile
equating of the IRT observed-score distributions is population
dependent, the close agreement between the two lines is reassuring.